
test: testapp bench#3214

Merged
julienrbrt merged 6 commits into main from julien/testapp-bench
Mar 31, 2026

Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Mar 30, 2026

Improve the command to stress-test evnode via testapp.

Summary by CodeRabbit

  • New Features

    • Added /stats HTTP endpoint exposing injected/executed tx and block metrics
    • Stress-test tool with worker-based concurrency, target-RPS, live RPS reporting, and summary output
    • Expanded runtime metrics and higher mempool capacity for the KV service
    • Updated runtime defaults and included an example passphrase for quick setup
  • Documentation

    • Revamped testapp README and bench docs with a 3-terminal quick start, CLI subcommands/flags, stress-test examples, and HTTP endpoint details

@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c311a4a5-c837-4934-81a5-b83f6e014156

📥 Commits

Reviewing files that changed from the base of the PR and between b8dc969 and 544c67b.

📒 Files selected for processing (1)
  • apps/testapp/cmd/init.go

📝 Walkthrough

Walkthrough

Adds a KV rollup reference implementation: new stress-test client, /stats HTTP endpoint, executor metrics and GetStats, increased mempool and sequencer sizing, updated init defaults, and comprehensive README/CLI documentation for running and stressing the KV server.

Changes

Cohort / File(s) Summary
Documentation
apps/testapp/README.md, apps/testapp/kv/bench/README.md
Rewrote docs to a KV rollup reference. Added 3-terminal quick start, build/run steps, CLI flag tables, stress-test usage, and HTTP endpoints documentation (/tx, /kv, /store, /stats).
Init & Examples
apps/testapp/cmd/init.go, apps/testapp/examples/passphrase.txt
Set cfg.Node.BlockTime = 10ms via rollconf.DurationWrapper and changed ChainIDFlag default to ev-test when unset. Added example passphrase file foo:bar.
Stress Test Tool
apps/testapp/kv/bench/main.go
Replaced interval-based benchmark with a worker-pool stress tester: -workers, -target-rps, persistent TCP HTTP requests, per-second progress and peak RPS, /stats sampling, and final summary reporting.
HTTP Server & Tests
apps/testapp/kv/http_server.go, apps/testapp/kv/http_server_test.go
Added injectedTxs atomic counter and GET /stats handler returning JSON with injected_txs, executed_txs, blocks_produced. handleTx increments injected counter. Minor test payload construction change.
Executor & Metrics
apps/testapp/kv/kvexecutor.go
Increased mempool buffer to 100_000_000. Added atomic counters (blocksProduced, totalExecutedTxs), ExecutorStats type and GetStats() method; incremented stats on commit. Optimized string/byte handling.
Runtime Param
apps/testapp/cmd/run.go
Adjusted sequencer construction numeric parameter from 1000 to 1_000_000.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Stress as Stress-Test
    participant HTTP as testapp HTTPServer
    participant Exec as KV Executor
    participant Store as PersistentStore

    Stress->>HTTP: POST /tx (raw HTTP over TCP)
    HTTP->>Exec: executor.InjectTx(tx)
    Exec->>Exec: enqueue to mempool / ExecuteTxs -> Commit
    Exec->>Store: persist block / SetFinal
    Exec-->>HTTP: acceptance (202)
    HTTP-->>Stress: 202 Accepted
    Stress->>HTTP: GET /stats (periodic)
    HTTP->>Exec: executor.GetStats()
    Exec-->>HTTP: {executed_txs, blocks_produced}
    HTTP-->>Stress: {injected_txs, executed_txs, blocks_produced}
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • test: testapp bench #3214 — Implements coordinated testapp changes: stress-test tooling, /stats endpoint, executor counters/GetStats, bench/client updates.

Suggested reviewers

  • tac0turtle
  • randygrok
  • Manav-Aggarwal

Poem

🐰 I nibble bytes and race the ticking clock,
Workers scurry, sending txs non-stop,
I poke /tx doors, then peek at /stats bright,
Blocks pile like carrots in the evening light,
Hop on — the KV hums through the night.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check — ❓ Inconclusive: the description states the goal but omits the implementation details, the changes made, and the technical rationale. Resolution: expand the description to explain what was changed (stress-test tool rewrite, KV server enhancements, queue size increases) and why these changes improve testapp's stress-testing capabilities.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title 'test: testapp bench' directly reflects the PR's core objective of improving stress-testing capabilities for testapp and follows semantic commit conventions.
  • Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, above the required 80.00% threshold.




@github-actions
Contributor

github-actions bot commented Mar 30, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Mar 31, 2026, 12:13 PM |

@claude
Contributor

claude bot commented Mar 30, 2026

Claude finished @julienrbrt's task in 1m 36s


Code Review: testapp bench improvements

  • Read AGENTS.md for guidelines
  • Review changed files
  • Provide comprehensive code review

This PR turns testapp into a useful KV rollup stress-test harness. The core direction is solid, but there are several correctness and resource issues worth addressing in follow-ups.


Critical Issues

1. Channel buffer size will OOM the process (kvexecutor.go:25)

txChannelBufferSize = 100_000_000

A chan []byte in Go uses ~24 bytes per slot (slice header on 64-bit). With 100M slots, this pre-allocates ~2.4 GB at startup before any payload bytes are stored. This will OOM most systems. The comment says "buffer size for the transaction channel" but doesn't explain why 100M is necessary. For a stress test that fills txs as fast as possible, a much smaller queue (e.g. 100K-1M) with backpressure is appropriate.

Fix this →


Correctness Issues

2. injectedTxs counter is inaccurate under load (http_server.go:111-113)

hs.executor.InjectTx(body)
hs.injectedTxs.Add(1)   // ← incremented even if InjectTx silently dropped the tx
w.WriteHeader(http.StatusAccepted)

InjectTx does a non-blocking channel send and silently drops the tx when the channel is full. The counter and 202 response both lie to the caller about whether the tx was accepted. The /stats output will over-report injections, making the summary metrics untrustworthy under saturation. InjectTx should return a bool/error indicating success, and the handler should only count and 202 on success (otherwise 429).

Fix this →

3. fetchStats silently returns zeros on error, causing uint64 underflow (bench/main.go:173-181)

func fetchStats(addr string) serverStats {
    resp, err := http.Get("http://" + addr + "/stats")
    if err != nil {
        return serverStats{}  // ← all zeros
    }
    // ...
    json.NewDecoder(resp.Body).Decode(&s)  // error ignored
    return s
}

If the pre-test "before" snapshot succeeds and the post-test "after" snapshot fails (or vice versa), after.BlocksProduced - before.BlocksProduced underflows: uint64(0) - uint64(N) wraps to a huge positive number. The summary will then print nonsensical block/tx counts. Return (serverStats, error) and skip delta calculations when a snapshot is unavailable.

Fix this →

4. TCP workers can block indefinitely (bench/main.go:121-149)

net.DialTimeout limits only the connection setup. After that, conn.Write(rawReq) and http.ReadResponse(br, nil) have no deadlines and can block forever if the server stalls. Workers will never check ctx.Done() while stuck in these calls, and the main goroutine will hang indefinitely waiting for done channel drain.

Use net.Dialer.DialContext to respect context cancellation during dial, and call conn.SetDeadline(time.Now().Add(timeout)) before each write/read.

Fix this →


Minor Issues

5. blocksProduced counter is incremented in ExecuteTxs, not per block (kvexecutor.go:310)

blocksProduced is incremented every time ExecuteTxs is called, but this doesn't account for rollbacks. After a Rollback(height) call, the counter stays elevated even though blocks were reverted. For a stress test this is probably fine, but the naming is misleading — it's "blocks committed" not "blocks produced".

6. ExecutorStats / GetStats() should be unexported (kvexecutor.go:45-55)

These are only consumed within the same executor package (in http_server.go). Exporting them unnecessarily expands the package API surface and requires Go doc comments per the project guidelines.

7. -target-rps flag doesn't enforce the rate (bench/main.go:28)

The flag is only used to print a "REACHED"/"NOT REACHED" banner at the end. Workers send as fast as possible regardless of the target. The flag name implies rate control. Consider renaming to -target-rps-goal or -rps-goal (or implement actual pacing) to avoid confusion.

8. init.go unconditionally overwrites block time (cmd/init.go:38)

cfg.Node.BlockTime = rollconf.DurationWrapper{Duration: 10 * time.Millisecond}

This line runs after rollconf.Load(cmd), overwriting any --evnode.node.block_time the user passed. The flag default should be set to 10ms instead, or the assignment should only apply when the flag was not explicitly set.

9. README startup command doesn't match build output (kv/bench/README.md:17)

Terminal 1 shows ./build/testapp start but the build command in this README produces ./stress-test. The parent apps/testapp/README.md builds to ./testapp. The paths are inconsistent; Terminal 1 should use ./testapp start --kv-endpoint localhost:9090.

10. Hardcoded sequencer queue cap (cmd/run.go:146)

1_000_000,  // maxQueueSize

Changed from 1_000 to 1_000_000 — but this is unclamped and paired with the 100M channel issue above. Both values together can lead to severe memory pressure. Making this configurable (or at least documenting the memory implications) would help operators tune for their environment.


Summary

The stress-test tooling direction is good and the worker-pool architecture is sound. The most urgent issues to address are the OOM-risk channel buffer size (#1), the inaccurate injectedTxs counter (#2), and the uint64 underflow in fetchStats (#3). The missing TCP deadlines (#4) are also a reliability concern in production stress runs.

@codecov

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.41%. Comparing base (60c68ea) to head (544c67b).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3214   +/-   ##
=======================================
  Coverage   61.41%   61.41%           
=======================================
  Files         120      120           
  Lines       12474    12474           
=======================================
  Hits         7661     7661           
  Misses       3953     3953           
  Partials      860      860           
| Flag | Coverage | Δ |
| --- | --- | --- |
| combined | 61.41% <ø> | (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@julienrbrt julienrbrt marked this pull request as ready for review March 30, 2026 21:30
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
apps/testapp/kv/kvexecutor.go (1)

45-54: Unexport ExecutorStats and GetStats() since they're package-private.

ExecutorStats and GetStats() are only consumed internally in http_server.go within the same package. Unexporting them to executorStats and getStats() keeps the API surface minimal. If they need to remain exported, they require Go doc comments per the coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/kvexecutor.go` around lines 45 - 54, Rename the exported
types and methods to unexported identifiers: change ExecutorStats to
executorStats and GetStats() to getStats() in KVExecutor, update the return type
and constructor usages accordingly, and update all internal references (e.g., in
http_server.go) to use executorStats and getStats(); ensure method receiver
signature remains the same (func (k *KVExecutor) getStats() executorStats) and
run `go build`/tests to catch any missed references.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/testapp/cmd/init.go`:
- Line 38: The init command currently unconditionally sets cfg.Node.BlockTime =
rollconf.DurationWrapper{Duration: 100 * time.Millisecond}, overwriting any
value rollconf.Load(cmd) parsed (e.g., --evnode.node.block_time=1s); change this
so you only apply 100ms when the user did not supply the flag or instead set the
flag's default to 100ms. Locate the init command logic around rollconf.Load(cmd)
and cfg.Node.BlockTime and either (a) detect whether the flag was provided (use
the command/flag API to check if evnode.node.block_time was set) and only assign
the 100ms default when not provided, or (b) set the CLI flag default for
evnode.node.block_time to 100ms and remove the unconditional assignment to
cfg.Node.BlockTime.

In `@apps/testapp/kv/bench/main.go`:
- Around line 173-181: The fetchStats function currently swallows HTTP/JSON
errors and returns an all-zero serverStats which can cause underflow when
callers subtract uint64 deltas; change fetchStats to return (serverStats, error)
instead of serverStats, propagate errors from http.Get and json.Decode
(including non-200 responses), and have callers check the error and skip
server-side delta calculations/reporting when a snapshot cannot be trusted (use
the presence of an error to avoid subtracting snapshots or to re-sample). Ensure
you update the signature and all call sites accordingly (references: fetchStats,
serverStats).
- Line 28: The flag targetRPS (variable targetRPS) is only reported and not
enforced; either implement pacing or rename it to avoid confusion — to implement
pacing, add a token-bucket or ticker driven rate limiter in the request send
path (e.g., inside the send loop function used to launch workers / sendRequests)
that consumes tokens at the rate specified by *targetRPS and blocks or sleeps
when tokens are exhausted, or alternatively rename the flag (and all uses in
banner/logging) to something like target-rps-goal or desired-rps to make it
clear it’s only informational; update the worker launch/send loop (the function
that iterates and issues requests referenced at lines 56-57 and 103-108) to
either use the limiter or to remove/rename the misleading references so behavior
matches the flag name.
- Around line 121-149: The worker currently uses net.DialTimeout and then
performs conn.Write and http.ReadResponse without timeouts, which can block
forever; replace the dial with a context-aware dial (use net.Dialer.DialContext
with ctx) and before each network operation set per-operation deadlines on conn
(via conn.SetDeadline(time.Now().Add(timeout))) so the conn.Write(rawReq) and
http.ReadResponse(br, nil) calls will time out and return errors that you handle
by calling failures.Add(1), closing conn, and breaking the loop; ensure the
ctx.Done() check remains and choose a sensible timeout value (e.g., a short
per-op timeout) so workers can exit promptly on server stalls or context
cancellation.

In `@apps/testapp/kv/bench/README.md`:
- Around line 16-20: The README uses an incorrect startup command: it shows
./build/testapp but the build artifacts are ./stress-test and ./testapp; update
the Terminal 1 command to start the built testapp binary (use ./testapp start
--kv-endpoint localhost:9090) so it matches the documented build output and the
Terminal 2 stress-test command; ensure the symbols shown (./build/testapp,
./testapp, ./stress-test, --kv-endpoint) are updated consistently in the README
entry.

In `@apps/testapp/kv/http_server.go`:
- Line 112: The injectedTxs counter is incremented unconditionally at
hs.injectedTxs.Add(1) even when KVExecutor.InjectTx drops transactions; change
the flow so InjectTx (in apps/testapp/kv/kvexecutor.go) returns a success
boolean or error indicating whether the tx was enqueued, then in the HTTP
handler only call hs.injectedTxs.Add(1) and return 202 when InjectTx reports
success—otherwise return an error status (e.g., 429/500) and do not increment
injectedTxs so dropped transactions are not counted.

In `@apps/testapp/kv/kvexecutor.go`:
- Line 25: The txChannelBufferSize constant is far too large and will
preallocate gigabytes when used in make(chan []byte, txChannelBufferSize) inside
NewKVExecutor; change txChannelBufferSize to a much smaller default (e.g., 1_000
or similar) and make it configurable (via constructor param or config/env) so
callers can tune it, or replace the large buffered channel with a bounded/ring
buffer implementation or an unbuffered channel plus worker backpressure; update
NewKVExecutor to accept the buffer size parameter (or config) and use it instead
of the hardcoded 100_000_000, and add a comment about memory implications.

---

Nitpick comments:
In `@apps/testapp/kv/kvexecutor.go`:
- Around line 45-54: Rename the exported types and methods to unexported
identifiers: change ExecutorStats to executorStats and GetStats() to getStats()
in KVExecutor, update the return type and constructor usages accordingly, and
update all internal references (e.g., in http_server.go) to use executorStats
and getStats(); ensure method receiver signature remains the same (func (k
*KVExecutor) getStats() executorStats) and run `go build`/tests to catch any
missed references.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5360d9f5-19ba-4c64-bdd9-9121413b1041

📥 Commits

Reviewing files that changed from the base of the PR and between 8d68f9d and 5581431.

📒 Files selected for processing (8)
  • apps/testapp/README.md
  • apps/testapp/cmd/init.go
  • apps/testapp/examples/passphrase.txt
  • apps/testapp/kv/bench/README.md
  • apps/testapp/kv/bench/main.go
  • apps/testapp/kv/http_server.go
  • apps/testapp/kv/http_server_test.go
  • apps/testapp/kv/kvexecutor.go

// we use load in order to parse all the flags
cfg, _ := rollconf.Load(cmd)
cfg.Node.Aggregator = aggregator
cfg.Node.BlockTime = rollconf.DurationWrapper{Duration: 100 * time.Millisecond}
Contributor


⚠️ Potential issue | 🟠 Major

Don't overwrite the parsed block-time value.

Line 38 clobbers whatever rollconf.Load(cmd) just read, so testapp init --evnode.node.block_time=1s still writes 100ms. If the goal is a faster default for this command, set the flag default or only apply 100ms when the user did not pass the flag; otherwise this also drifts from apps/testapp/cmd/init_test.go, which still expects block_time: 1s.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/cmd/init.go` at line 38, The init command currently
unconditionally sets cfg.Node.BlockTime = rollconf.DurationWrapper{Duration: 100
* time.Millisecond}, overwriting any value rollconf.Load(cmd) parsed (e.g.,
--evnode.node.block_time=1s); change this so you only apply 100ms when the user
did not supply the flag or instead set the flag's default to 100ms. Locate the
init command logic around rollconf.Load(cmd) and cfg.Node.BlockTime and either
(a) detect whether the flag was provided (use the command/flag API to check if
evnode.node.block_time was set) and only assign the 100ms default when not
provided, or (b) set the CLI flag default for evnode.node.block_time to 100ms
and remove the unconditional assignment to cfg.Node.BlockTime.

addr := flag.String("addr", "localhost:9090", "server host:port")
duration := flag.Duration("duration", 10*time.Second, "test duration")
workers := flag.Int("workers", 1000, "concurrent workers")
targetRPS := flag.Uint64("target-rps", 10_000_000, "target requests per second")
Contributor


⚠️ Potential issue | 🟠 Major

-target-rps is only reported, not enforced.

The new flag looks like a rate-control knob, but nothing in the worker launch or send loop uses it to pace requests. Right now it only changes the pass/fail banner, so runs with -target-rps=100 and -target-rps=10000000 generate the same load. Either add pacing/token-bucket logic or rename the flag so it is clearly just a goal.

Also applies to: 56-57, 103-108

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/bench/main.go` at line 28, The flag targetRPS (variable
targetRPS) is only reported and not enforced; either implement pacing or rename
it to avoid confusion — to implement pacing, add a token-bucket or ticker driven
rate limiter in the request send path (e.g., inside the send loop function used
to launch workers / sendRequests) that consumes tokens at the rate specified by
*targetRPS and blocks or sleeps when tokens are exhausted, or alternatively
rename the flag (and all uses in banner/logging) to something like
target-rps-goal or desired-rps to make it clear it’s only informational; update
the worker launch/send loop (the function that iterates and issues requests
referenced at lines 56-57 and 103-108) to either use the limiter or to
remove/rename the misleading references so behavior matches the flag name.

Comment on lines +121 to +149
```go
conn, err := net.DialTimeout("tcp", addr, time.Second)
if err != nil {
	failures.Add(1)
	continue
}

br := bufio.NewReaderSize(conn, 512)

for {
	select {
	case <-ctx.Done():
		conn.Close()
		return
	default:
	}

	if _, err := conn.Write(rawReq); err != nil {
		failures.Add(1)
		conn.Close()
		break
	}

	resp, err := http.ReadResponse(br, nil)
	if err != nil {
		failures.Add(1)
		conn.Close()
		break
	}
	io.Copy(io.Discard, resp.Body)
```
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n apps/testapp/kv/bench/main.go | head -200

Repository: evstack/ev-node

Length of output: 5905


Set deadlines on TCP operations to prevent workers from blocking indefinitely.

Line 121 only limits the connection establishment time. After that, conn.Write and http.ReadResponse have no timeouts and can block forever. If the server is slow or stalled, workers get stuck in these operations, never checking ctx.Done(), and the main goroutine hangs waiting for them to drain the done channel.

Use DialContext to respect context cancellation during connection, and set per-operation deadlines with SetDeadline so Write and ReadResponse calls are bounded:

Suggested direction
-		conn, err := net.DialTimeout("tcp", addr, time.Second)
+		d := net.Dialer{Timeout: time.Second}
+		conn, err := d.DialContext(ctx, "tcp", addr)
 		if err != nil {
 			failures.Add(1)
 			continue
 		}
@@
-			if _, err := conn.Write(rawReq); err != nil {
+			if err := conn.SetDeadline(time.Now().Add(time.Second)); err != nil {
+				failures.Add(1)
+				conn.Close()
+				break
+			}
+			if _, err := conn.Write(rawReq); err != nil {
 				failures.Add(1)
 				conn.Close()
 				break
 			}
 
 			resp, err := http.ReadResponse(br, nil)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
-conn, err := net.DialTimeout("tcp", addr, time.Second)
+d := net.Dialer{Timeout: time.Second}
+conn, err := d.DialContext(ctx, "tcp", addr)
 if err != nil {
 	failures.Add(1)
 	continue
 }

 br := bufio.NewReaderSize(conn, 512)

 for {
 	select {
 	case <-ctx.Done():
 		conn.Close()
 		return
 	default:
 	}

+	if err := conn.SetDeadline(time.Now().Add(time.Second)); err != nil {
+		failures.Add(1)
+		conn.Close()
+		break
+	}
 	if _, err := conn.Write(rawReq); err != nil {
 		failures.Add(1)
 		conn.Close()
 		break
 	}

 	resp, err := http.ReadResponse(br, nil)
 	if err != nil {
 		failures.Add(1)
 		conn.Close()
 		break
 	}
 	io.Copy(io.Discard, resp.Body)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/bench/main.go` around lines 121 - 149, The worker currently
uses net.DialTimeout and then performs conn.Write and http.ReadResponse without
timeouts, which can block forever; replace the dial with a context-aware dial
(use net.Dialer.DialContext with ctx) and before each network operation set
per-operation deadlines on conn (via conn.SetDeadline(time.Now().Add(timeout)))
so the conn.Write(rawReq) and http.ReadResponse(br, nil) calls will time out and
return errors that you handle by calling failures.Add(1), closing conn, and
breaking the loop; ensure the ctx.Done() check remains and choose a sensible
timeout value (e.g., a short per-op timeout) so workers can exit promptly on
server stalls or context cancellation.

Comment on lines +173 to +181
func fetchStats(addr string) serverStats {
resp, err := http.Get("http://" + addr + "/stats")
if err != nil {
return serverStats{}
}
defer resp.Body.Close()
var s serverStats
json.NewDecoder(resp.Body).Decode(&s)
return s
Contributor


⚠️ Potential issue | 🟠 Major

Don't collapse /stats failures into zero counters.

Line 176 turns any fetch failure into an all-zero snapshot, and the caller later subtracts those snapshots as uint64. If the initial sample succeeds and the final sample fails, or the server restarts between samples, the delta math underflows and the summary prints enormous bogus block/tx counts. Return an error from fetchStats and skip server-side delta reporting when a snapshot cannot be trusted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/bench/main.go` around lines 173 - 181, The fetchStats
function currently swallows HTTP/JSON errors and returns an all-zero serverStats
which can cause underflow when callers subtract uint64 deltas; change fetchStats
to return (serverStats, error) instead of serverStats, propagate errors from
http.Get and json.Decode (including non-200 responses), and have callers check
the error and skip server-side delta calculations/reporting when a snapshot
cannot be trusted (use the presence of an error to avoid subtracting snapshots
or to re-sample). Ensure you update the signature and all call sites accordingly
(references: fetchStats, serverStats).

Comment on lines +16 to +20
```bash
# Terminal 1: start the testapp with KV endpoint
./build/testapp start --kv-endpoint localhost:9090

# Terminal 2: run the stress test
./stress-test --addr localhost:9090 --duration 10s --workers 1000
```
Contributor


⚠️ Potential issue | 🟡 Minor

The testapp startup command doesn't match the documented build output.

Line 17 runs ./build/testapp, but the documented build commands produce ./stress-test here and ./testapp in apps/testapp/README.md. Following the steps as written will not create ./build/testapp.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/bench/README.md` around lines 16 - 20, The README uses an
incorrect startup command: it shows ./build/testapp but the build artifacts are
./stress-test and ./testapp; update the Terminal 1 command to start the built
testapp binary (use ./testapp start --kv-endpoint localhost:9090) so it matches
the documented build output and the Terminal 2 stress-test command; ensure the
symbols shown (./build/testapp, ./testapp, ./stress-test, --kv-endpoint) are
updated consistently in the README entry.

}

hs.executor.InjectTx(body)
hs.injectedTxs.Add(1)
Contributor


⚠️ Potential issue | 🟠 Major

Don't count dropped transactions as injected.

KVExecutor.InjectTx in apps/testapp/kv/kvexecutor.go still drops when txChan is full, but Line 112 increments injectedTxs unconditionally and Line 113 still returns 202. Under load, /stats will over-report injections and callers will see success for work that was discarded. Have InjectTx return a bool/error and only increment/respond 202 on a successful enqueue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/http_server.go` at line 112, The injectedTxs counter is
incremented unconditionally at hs.injectedTxs.Add(1) even when
KVExecutor.InjectTx drops transactions; change the flow so InjectTx (in
apps/testapp/kv/kvexecutor.go) returns a success boolean or error indicating
whether the tx was enqueued, then in the HTTP handler only call
hs.injectedTxs.Add(1) and return 202 when InjectTx reports success—otherwise
return an error status (e.g., 429/500) and do not increment injectedTxs so
dropped transactions are not counted.

	finalizedHeightKey = ds.NewKey("/finalizedHeight")
	// Define a buffer size for the transaction channel
-	txChannelBufferSize = 10000
+	txChannelBufferSize = 100_000_000
Contributor

⚠️ Potential issue | 🔴 Critical

This mempool size will allocate gigabytes at startup.

Line 25 drives make(chan []byte, txChannelBufferSize) in NewKVExecutor. With 100_000_000 slots, a buffered chan []byte needs roughly 24 bytes per slot just for slice headers on 64-bit Go, so the queue alone is already well over 2 GB before any payload bytes are stored. That is likely to OOM the process or starve everything else of memory. As per coding guidelines, "Consider memory allocation in hot paths of Go code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/kvexecutor.go` at line 25, The txChannelBufferSize constant
is far too large and will preallocate gigabytes when used in make(chan []byte,
txChannelBufferSize) inside NewKVExecutor; change txChannelBufferSize to a much
smaller default (e.g., 1_000 or similar) and make it configurable (via
constructor param or config/env) so callers can tune it, or replace the large
buffered channel with a bounded/ring buffer implementation or an unbuffered
channel plus worker backpressure; update NewKVExecutor to accept the buffer size
parameter (or config) and use it instead of the hardcoded 100_000_000, and add a
comment about memory implications.

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/testapp/cmd/run.go (1)

140-149: ⚠️ Potential issue | 🟠 Major

Avoid hardcoding a 1,000,000 sequencer queue cap.

Line 146 sets an extremely high maxQueueSize with no local validation/clamping. Given BatchQueue grows via append and Prepend bypasses the cap, this can cause severe memory pressure/OOM during stress or recovery paths. Please make this value configurable and enforce a bounded limit before passing it to single.NewSequencer.

Suggested direction
-	sequencer, err := single.NewSequencer(
+	// TODO: source from config/flag with validation instead of hardcoding.
+	maxQueueSize := 10_000
+	sequencer, err := single.NewSequencer(
 		logger,
 		datastore,
 		daClient,
 		nodeConfig,
 		[]byte(genesis.ChainID),
-		1_000_000,
+		maxQueueSize,
 		genesis,
 		executor,
 	)

As per coding guidelines, "Consider memory allocation in hot paths of Go code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/cmd/run.go` around lines 140 - 149, The call to
single.NewSequencer currently passes a hardcoded maxQueueSize (1_000_000) which
risks unbounded memory use; make this value configurable (e.g., add a CLI flag
or a field on nodeConfig such as MaxSequencerQueueSize) and validate/clamp it to
a safe upper bound (e.g., MAX_ALLOWED_QUEUE_SIZE constant) before calling
single.NewSequencer in run.go; ensure the clamping logic is applied where
sequencer is constructed so both normal append and any Prepend/BatchQueue paths
cannot exceed the enforced limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 30c00489-4bfe-407d-8107-9189f8023f9d

📥 Commits

Reviewing files that changed from the base of the PR and between 5581431 and b8dc969.

📒 Files selected for processing (1)
  • apps/testapp/cmd/run.go

@@ -0,0 +1 @@
foo:bar
Contributor

this needed?

Member Author

It just makes running the command faster. I can remove it from being committed.

| `-addr` | `localhost:9090` | Server host:port |
| `-duration` | `10s` | Test duration |
| `-workers` | `1000` | Number of concurrent TCP workers |
| `-target-rps` | `10000000` | Target requests per second (goal) |
Contributor

is this transaction submission or requests on the api?

Member Author

@julienrbrt julienrbrt Mar 31, 2026


It is tx execution: how fast the KV executor is able to process txs.
Maybe we should clarify what we indeed mean by req/s.

@julienrbrt julienrbrt marked this pull request as draft March 31, 2026 09:06
@julienrbrt julienrbrt marked this pull request as ready for review March 31, 2026 11:40
@julienrbrt julienrbrt merged commit f40cd4d into main Mar 31, 2026
14 of 18 checks passed
@julienrbrt julienrbrt deleted the julien/testapp-bench branch March 31, 2026 12:13